feat(disruption): disk full injection. #1058
d238abd to 6109573
🎉 All green! 🧪 All tests passed. 🔗 Commit SHA: 02d1b48
|
The diskFull disruption creates a ballast file on the host filesystem via the injector pod. The injector pod mounts the host root at /mnt/host, but that mount has ReadOnly: true, which was correct for all existing injectors (network, CPU, etc.) that only read the host. diskFull must write to the host, so it fails with a read-only file system error (EROFS) before it can ever produce ENOSPC.
Root cause
{
	Name:      "host",
	MountPath: "/mnt/host",
	ReadOnly:  true, // ← must be false for diskFull
},
Fix
File:
Change signature:
func (m *chaosPodService) generateChaosPodSpec(..., hostWritable bool) corev1.PodSpec {
Inside the function, use the parameter:
File:
Spec: m.generateChaosPodSpec(
	targetNodeName,
	terminationGracePeriod,
	activeDeadlineSeconds,
	args,
	hostPathDirectory,
	hostPathFile,
	kind == chaostypes.DisruptionKindDiskFull, // hostWritable
),
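To make the proposed change concrete, here is a minimal, self-contained sketch of how a hostWritable flag could drive the ReadOnly field of the host mount. The volumeMount type is a hypothetical stand-in for corev1.VolumeMount, hostVolumeMount is a hypothetical helper, and the kind strings in main are placeholders. None of them are taken from the chaos-controller code.

```go
package main

import "fmt"

// volumeMount is a hypothetical stand-in for corev1.VolumeMount, reduced to
// the three fields quoted in the review comment above.
type volumeMount struct {
	Name      string
	MountPath string
	ReadOnly  bool
}

// hostVolumeMount is a hypothetical helper: it builds the host root mount and
// only makes it writable when the caller explicitly asks for it.
func hostVolumeMount(hostWritable bool) volumeMount {
	return volumeMount{
		Name:      "host",
		MountPath: "/mnt/host",
		ReadOnly:  !hostWritable,
	}
}

func main() {
	// Placeholder kind names, used only to illustrate the call site.
	for _, kind := range []string{"network-disruption", "disk-full"} {
		m := hostVolumeMount(kind == "disk-full")
		fmt.Printf("%s: readOnly=%t\n", kind, m.ReadOnly)
	}
}
```

Defaulting to read-only and opting in per disruption kind keeps the behavior of the existing injectors unchanged.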
|
Many thanks for the deep investigation. I will fix that ASAP. I still have concerns about granting write access to the entire filesystem just to write a ballast file in a dedicated directory. It would allow someone with access to the pod to alter the disrupted pod/node for purposes other than the expected disruption. I will propose a security gate.
|
Could you also create an example file to test the disruption locally:
# Unless explicitly stated otherwise all files in this repository are licensed
# under the Apache License Version 2.0.
# This product includes software developed at Datadog (https://www.datadoghq.com/).
# Copyright 2026 Datadog, Inc.
apiVersion: chaos.datadoghq.com/v1beta1
kind: Disruption
metadata:
  name: disk-full
  namespace: chaos-demo
spec:
  level: pod
  selector:
    service: demo-curl
  count: 1
  duration: 10m
  diskFull:
    path: "/mnt/data"
    capacity: "95%"
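As an illustration of what a capacity: "95%" setting could translate to, the sketch below computes a ballast size from a volume's total and used bytes. It assumes the percentage is the target fill level of the whole volume, which this thread does not confirm. ballastBytes and its parameters are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ballastBytes is a hypothetical helper. ASSUMPTION: capacity is the target
// fill level of the volume, so the ballast is whatever is still missing to
// reach that level. It returns 0 when the volume is already at or above it.
func ballastBytes(totalBytes, usedBytes uint64, capacity string) (uint64, error) {
	if !strings.HasSuffix(capacity, "%") {
		return 0, fmt.Errorf("capacity %q: expected a percentage such as \"95%%\"", capacity)
	}
	pct, err := strconv.ParseUint(strings.TrimSuffix(capacity, "%"), 10, 64)
	if err != nil || pct > 100 {
		return 0, fmt.Errorf("capacity %q: expected an integer between 0%% and 100%%", capacity)
	}
	// Split the multiplication to avoid overflowing uint64 on very large volumes.
	target := totalBytes/100*pct + totalBytes%100*pct/100
	if usedBytes >= target {
		return 0, nil
	}
	return target - usedBytes, nil
}

func main() {
	// A 10 GiB volume with 4 GiB already used.
	n, err := ballastBytes(10<<30, 4<<30, "95%")
	if err != nil {
		panic(err)
	}
	fmt.Printf("ballast size: %d bytes\n", n)
}
```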
|
Could you also update the |
|
Could you also update the |
Signed-off-by: Thibault NORMAND <thibault.normand@datadoghq.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Thibault NORMAND <me@zenithar.org>
… address PR comments. Add diskFull to 5 missing registration points in validateGlobalDisruptionScope (at-least-one-kind check, ContainerFailure/NodeFailure/PodReplacement compatibility, OnInit compatibility), DisruptionCount(), and Explain(). Add writable shadow mount for the target path in chaos pod spec so the injector can write ballast files while keeping /mnt/host read-only. Add capacity mode test coverage, disk_full example, complete.yaml entry, and docs/README.md link. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
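The writable shadow mount described in this commit can be pictured with the sketch below: the host root stays read-only and a second, writable mount covers only the target path. The volumeMount type, the chaosPodMounts helper, the volume name disk-full-target, and the choice to nest the mount under /mnt/host are all assumptions made for illustration; the commit message does not show the actual layout.

```go
package main

import (
	"fmt"
	"path"
)

// volumeMount is a hypothetical stand-in for corev1.VolumeMount.
type volumeMount struct {
	Name      string
	MountPath string
	ReadOnly  bool
}

// chaosPodMounts is a hypothetical helper. It always mounts the host root
// read-only and, when a target path is given, adds a writable mount that
// shadows only that path (ASSUMPTION: nested under /mnt/host).
func chaosPodMounts(targetPath string) []volumeMount {
	mounts := []volumeMount{
		{Name: "host", MountPath: "/mnt/host", ReadOnly: true},
	}
	if targetPath != "" {
		mounts = append(mounts, volumeMount{
			Name:      "disk-full-target", // hypothetical volume name
			MountPath: path.Join("/mnt/host", targetPath),
			ReadOnly:  false,
		})
	}
	return mounts
}

func main() {
	for _, m := range chaosPodMounts("/mnt/data") {
		fmt.Printf("%-17s %-20s readOnly=%t\n", m.Name, m.MountPath, m.ReadOnly)
	}
}
```

Compared with flipping the whole host mount to writable, this limits write access to the one directory that receives the ballast file, which is the concern raised earlier in the review.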
fb46e35 to 6a5b614
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Full review comments:
|
@aymericDD, it would be useful to have configuration validation tests, because most of the configuration files that act as examples are not validated in CI. That would help spot such discrepancies, especially when we introduce new features or schema-breaking changes.
1ad02eb to 02d1b48
What does this PR do?
Adds a new diskFull disruption kind that genuinely fills a target pod volume using the fallocate(2) syscall, causing real ENOSPC errors on all subsequent write operations. This fills a gap where existing disruptions (DiskPressure = I/O throttling, DiskFailure = eBPF on openat only) don't simulate actual disk exhaustion visible to monitoring and all syscalls.
Features
fallocate(2) syscall (instant, O(1) on ext4/xfs) to genuinely consume disk space. Falls back to writing zeros on unsupported filesystems.
… unsafeMode.allowDiskFullNoFloor). Pod-level only. Webhook warning for ephemeral-storage eviction risk.
fallocate/ package (adapted from detailyang/go-fallocate, MIT): no dependency on fallocate or dd binaries in the injector image.
How it differs from existing disruptions
(comparison table not recovered; surviving cells: "df/monitoring?", "openat only")
Example
Code Quality Checklist
Testing
unit tests.
Test coverage
Files changed (24 files, ~1350 lines)
api/v1beta1/disk_full.go, disruption_types.go, disruption_webhook.go, safemode.go
injector/disk_full.go (ballast file via fallocate)
cli/injector/disk_full.go, cli/injector/main.go
fallocate/ (4 platform-specific files, adapted from go-fallocate MIT)
safemode/safemode_disk_full.go, safemode/safemode.go
types/types.go (DisruptionKindDiskFull)
docs/disk_full.md, docs/disruption_catalogue.md
api/v1beta1/disk_full_test.go, injector/disk_full_test.go